Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pagination visitors stream #129

Merged
merged 11 commits into from
Nov 14, 2023
Merged

Conversation

RushiT0122
Copy link
Contributor

@RushiT0122 RushiT0122 commented Nov 9, 2023

Description of change

Added the the custom pagination support for the Visitors stream TDL-24201

Now visitor records will be synced in the batches of record_limit which defaults to 100,000 instead of syncing them all in one go reducing the memory stress.

Manual QA steps

  • Verify pagination works for visitors only stream with/without include_anonymous_visitors="true"
  • Verify all records are synced as expected when visitors and child stream visitor_history streams are selected
  • Verify implementation works with cloned client connection

Risks

  • If visitor history has lot of data then we may run into memory stress again, so we need to choose optimum 'record_limit' property value.

Rollback steps

  • revert this branch

Comment on lines 739 to 742
if self.name in ["visitors"]:
# If number of records equal to record then assume there are more records to be synced
# and save the last filter value. Otherwise assume we have extracted all records
loop_for_records = self.get_record_count() >= self.record_limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly suggest not following this approach as it creates inverted dependancy, this function belongs to the generic implementation of the stream class, and this condition refers to the name of a child class.

i would suggest that you modify this method in the Visitors class below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you can convert this into a generic implementation

@RushiT0122
Copy link
Contributor Author

I have added visitors test data, that will be available for testing tomorrow. Please expect change at base.py#L34 to reduce the daily execution time in CCi.

START_DATE_VISTOR_HISTORY = "2023-03-15T00:00:00Z" --> START_DATE_VISTOR_HISTORY = "2023-11-05T00:00:00Z"

@sgandhi1311 sgandhi1311 merged commit dafe891 into master Nov 14, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants